On November 27, 1895, Alfred Nobel signed his last will in Paris. When it was opened after his death, the will caused a lot of controversy, as Nobel had left much of his wealth for the establishment of a prize.
Alfred Nobel dictates that his entire remaining estate should be used to endow “prizes to those who, during the preceding year, have conferred the greatest benefit to humankind”.
Every year the Nobel Prize is given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace.

Let's see what patterns we can find in the data of the past Nobel laureates. What can we learn about the Nobel prize and our world more generally?
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
pd.options.display.float_format = '{:,.2f}'.format
df_data = pd.read_csv('nobel_prize_data.csv')
df_data.head()
| year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Netherlands | Male | Berlin University | Berlin | Germany | NLD |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | France | Male | NaN | NaN | NaN | FRA |
| 2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Poland | Male | Marburg University | Marburg | Germany | POL |
| 3 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | France | Male | NaN | NaN | NaN | FRA |
| 4 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Switzerland | Male | NaN | NaN | NaN | CHE |
Caveats: The exact birth dates for Michael Houghton, Venkatraman Ramakrishnan, and Nadia Murad are unknown. I've substituted them with mid-year estimate of July 2nd.
# no. of rows and columns
df_data.shape
(962, 16)
# column names
df_data.columns
Index(['year', 'category', 'prize', 'motivation', 'prize_share',
'laureate_type', 'full_name', 'birth_date', 'birth_city',
'birth_country', 'birth_country_current', 'sex', 'organization_name',
'organization_city', 'organization_country', 'ISO'],
dtype='object')
# sort the award by year (from earliest)
df_data.sort_values('year')
| year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Netherlands | Male | Berlin University | Berlin | Germany | NLD |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | France | Male | NaN | NaN | NaN | FRA |
| 2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Poland | Male | Marburg University | Marburg | Germany | POL |
| 3 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | France | Male | NaN | NaN | NaN | FRA |
| 4 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Switzerland | Male | NaN | NaN | NaN | CHE |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 951 | 2020 | Chemistry | The Nobel Prize in Chemistry 2020 | “for the development of a method for genome ed... | 1/2 | Individual | Jennifer A. Doudna | 1964-02-19 | Washington, DC | United States of America | United States of America | Female | University of California | Berkeley, CA | United States of America | USA |
| 950 | 2020 | Chemistry | The Nobel Prize in Chemistry 2020 | “for the development of a method for genome ed... | 1/2 | Individual | Emmanuelle Charpentier | 1968-12-11 | Juvisy-sur-Orge | France | France | Female | Max-Planck-Institut | Berlin | Germany | FRA |
| 960 | 2020 | Physics | The Nobel Prize in Physics 2020 | “for the discovery of a supermassive compact o... | 1/4 | Individual | Reinhard Genzel | 1952-03-24 | Bad Homburg vor der Höhe | Germany | Germany | Male | University of California | Los Angeles, CA | United States of America | DEU |
| 954 | 2020 | Literature | The Nobel Prize in Literature 2020 | “for her unmistakable poetic voice that with a... | 1/1 | Individual | Louise Glück | 1943-04-22 | New York, NY | United States of America | United States of America | Female | NaN | NaN | NaN | USA |
| 961 | 2020 | Physics | The Nobel Prize in Physics 2020 | “for the discovery that black hole formation i... | 1/2 | Individual | Roger Penrose | 1931-08-08 | Colchester | United Kingdom | United Kingdom | Male | University of Oxford | Oxford | United Kingdom | GBR |
962 rows × 16 columns
df_data.duplicated().any()
False
df_data.isna().sum()
year 0 category 0 prize 0 motivation 88 prize_share 0 laureate_type 0 full_name 0 birth_date 28 birth_city 31 birth_country 28 birth_country_current 28 sex 28 organization_name 255 organization_city 255 organization_country 254 ISO 28 dtype: int64
col_subset = ['year','category', 'laureate_type',
'birth_date','full_name', 'organization_name']
df_data.loc[df_data.birth_date.isna()][col_subset]
| year | category | laureate_type | birth_date | full_name | organization_name | |
|---|---|---|---|---|---|---|
| 24 | 1904 | Peace | Organization | NaN | Institut de droit international (Institute of ... | NaN |
| 60 | 1910 | Peace | Organization | NaN | Bureau international permanent de la Paix (Per... | NaN |
| 89 | 1917 | Peace | Organization | NaN | Comité international de la Croix Rouge (Intern... | NaN |
| 200 | 1938 | Peace | Organization | NaN | Office international Nansen pour les Réfugiés ... | NaN |
| 215 | 1944 | Peace | Organization | NaN | Comité international de la Croix Rouge (Intern... | NaN |
| 237 | 1947 | Peace | Organization | NaN | American Friends Service Committee (The Quakers) | NaN |
| 238 | 1947 | Peace | Organization | NaN | Friends Service Council (The Quakers) | NaN |
| 283 | 1954 | Peace | Organization | NaN | Office of the United Nations High Commissioner... | NaN |
| 348 | 1963 | Peace | Organization | NaN | Comité international de la Croix Rouge (Intern... | NaN |
| 349 | 1963 | Peace | Organization | NaN | Ligue des Sociétés de la Croix-Rouge (League o... | NaN |
| 366 | 1965 | Peace | Organization | NaN | United Nations Children's Fund (UNICEF) | NaN |
| 399 | 1969 | Peace | Organization | NaN | International Labour Organization (I.L.O.) | NaN |
| 479 | 1977 | Peace | Organization | NaN | Amnesty International | NaN |
| 523 | 1981 | Peace | Organization | NaN | Office of the United Nations High Commissioner... | NaN |
| 558 | 1985 | Peace | Organization | NaN | International Physicians for the Prevention of... | NaN |
| 588 | 1988 | Peace | Organization | NaN | United Nations Peacekeeping Forces | NaN |
| 659 | 1995 | Peace | Organization | NaN | Pugwash Conferences on Science and World Affairs | NaN |
| 682 | 1997 | Peace | Organization | NaN | International Campaign to Ban Landmines (ICBL) | NaN |
| 703 | 1999 | Peace | Organization | NaN | Médecins Sans Frontières | NaN |
| 730 | 2001 | Peace | Organization | NaN | United Nations (U.N.) | NaN |
| 778 | 2005 | Peace | Organization | NaN | International Atomic Energy Agency (IAEA) | NaN |
| 788 | 2006 | Peace | Organization | NaN | Grameen Bank | NaN |
| 801 | 2007 | Peace | Organization | NaN | Intergovernmental Panel on Climate Change (IPCC) | NaN |
| 860 | 2012 | Peace | Organization | NaN | European Union (EU) | NaN |
| 873 | 2013 | Peace | Organization | NaN | Organisation for the Prohibition of Chemical W... | NaN |
| 897 | 2015 | Peace | Organization | NaN | National Dialogue Quartet | NaN |
| 919 | 2017 | Peace | Organization | NaN | International Campaign to Abolish Nuclear Weap... | NaN |
| 958 | 2020 | Peace | Organization | NaN | World Food Programme (WFP) | NaN |
col_subset = ['year','category', 'laureate_type','full_name', 'organization_name']
df_data.loc[df_data.organization_name.isna()][col_subset]
| year | category | laureate_type | full_name | organization_name | |
|---|---|---|---|---|---|
| 1 | 1901 | Literature | Individual | Sully Prudhomme | NaN |
| 3 | 1901 | Peace | Individual | Frédéric Passy | NaN |
| 4 | 1901 | Peace | Individual | Jean Henry Dunant | NaN |
| 7 | 1902 | Literature | Individual | Christian Matthias Theodor Mommsen | NaN |
| 9 | 1902 | Peace | Individual | Charles Albert Gobat | NaN |
| ... | ... | ... | ... | ... | ... |
| 932 | 2018 | Peace | Individual | Nadia Murad | NaN |
| 942 | 2019 | Literature | Individual | Peter Handke | NaN |
| 946 | 2019 | Peace | Individual | Abiy Ahmed Ali | NaN |
| 954 | 2020 | Literature | Individual | Louise Glück | NaN |
| 958 | 2020 | Peace | Organization | World Food Programme (WFP) | NaN |
255 rows × 5 columns
df_data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 962 entries, 0 to 961 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 962 non-null int64 1 category 962 non-null object 2 prize 962 non-null object 3 motivation 874 non-null object 4 prize_share 962 non-null object 5 laureate_type 962 non-null object 6 full_name 962 non-null object 7 birth_date 934 non-null object 8 birth_city 931 non-null object 9 birth_country 934 non-null object 10 birth_country_current 934 non-null object 11 sex 934 non-null object 12 organization_name 707 non-null object 13 organization_city 707 non-null object 14 organization_country 708 non-null object 15 ISO 934 non-null object dtypes: int64(1), object(15) memory usage: 120.4+ KB
df_data.birth_date = pd.to_datetime(df_data.birth_date)
separated_values = df_data.prize_share.str.split('/', expand=True) # Expand the split strings into separate columns
numerator = pd.to_numeric(separated_values[0])
denomenator = pd.to_numeric(separated_values[1])
df_data['share_pct'] = numerator / denomenator
df_data.share_pct
0 1.00
1 1.00
2 1.00
3 0.50
4 0.50
...
957 0.33
958 1.00
959 0.25
960 0.25
961 0.50
Name: share_pct, Length: 962, dtype: float64
df_data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 962 entries, 0 to 961 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 962 non-null int64 1 category 962 non-null object 2 prize 962 non-null object 3 motivation 874 non-null object 4 prize_share 962 non-null object 5 laureate_type 962 non-null object 6 full_name 962 non-null object 7 birth_date 934 non-null datetime64[ns] 8 birth_city 931 non-null object 9 birth_country 934 non-null object 10 birth_country_current 934 non-null object 11 sex 934 non-null object 12 organization_name 707 non-null object 13 organization_city 707 non-null object 14 organization_country 708 non-null object 15 ISO 934 non-null object 16 share_pct 962 non-null float64 dtypes: datetime64[ns](1), float64(1), int64(1), object(14) memory usage: 127.9+ KB
biology = df_data.sex.value_counts()
fig = px.pie(labels=biology.index,
values=biology.values,
title="Percentage of Male vs. Female Winners",
names=biology.index,
hole=0.4,)
fig.update_traces(textposition='inside', textfont_size=15, textinfo='percent')
fig.show()
is_winner = df_data.duplicated(subset=['full_name'], keep=False)
multiple_winners = df_data[is_winner]
print(f'There are {multiple_winners.full_name.nunique()}' \
' winners who were awarded the prize more than once.')
There are 6 winners who were awarded the prize more than once.
col_subset = ['year', 'category', 'laureate_type', 'full_name']
multiple_winners[col_subset]
| year | category | laureate_type | full_name | |
|---|---|---|---|---|
| 18 | 1903 | Physics | Individual | Marie Curie, née Sklodowska |
| 62 | 1911 | Chemistry | Individual | Marie Curie, née Sklodowska |
| 89 | 1917 | Peace | Organization | Comité international de la Croix Rouge (Intern... |
| 215 | 1944 | Peace | Organization | Comité international de la Croix Rouge (Intern... |
| 278 | 1954 | Chemistry | Individual | Linus Carl Pauling |
| 283 | 1954 | Peace | Organization | Office of the United Nations High Commissioner... |
| 297 | 1956 | Physics | Individual | John Bardeen |
| 306 | 1958 | Chemistry | Individual | Frederick Sanger |
| 340 | 1962 | Peace | Individual | Linus Carl Pauling |
| 348 | 1963 | Peace | Organization | Comité international de la Croix Rouge (Intern... |
| 424 | 1972 | Physics | Individual | John Bardeen |
| 505 | 1980 | Chemistry | Individual | Frederick Sanger |
| 523 | 1981 | Peace | Organization | Office of the United Nations High Commissioner... |
df_data.category.nunique()
6
prizes_per_category = df_data.category.value_counts()
v_bar = px.bar(
x = prizes_per_category.index,
y = prizes_per_category.values,
color = prizes_per_category.values,
color_continuous_scale='Aggrnyl',
title='Number of Prizes Awarded per Category')
v_bar.update_layout(xaxis_title='Nobel Prize Category',
coloraxis_showscale=False,
yaxis_title='Number of Prizes')
v_bar.show()
cat_men_women = df_data.groupby(['category', 'sex'],
as_index=False).agg({'prize': pd.Series.count})
cat_men_women.sort_values('prize', ascending=False, inplace=True)
v_bar_split = px.bar(x = cat_men_women.category,
y = cat_men_women.prize,
color = cat_men_women.sex,
title='Number of Prizes Awarded per Category split by Men and Women')
v_bar_split.update_layout(xaxis_title='Nobel Prize Category',
yaxis_title='Number of Prizes')
v_bar_split.show()
prize_per_year = df_data.groupby(by='year').count().prize
moving_average = prize_per_year.rolling(window=5).mean()
np.arange(1900, 2021, step=5)
array([1900, 1905, 1910, 1915, 1920, 1925, 1930, 1935, 1940, 1945, 1950,
1955, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005,
2010, 2015, 2020])
plt.figure(figsize=(16,8), dpi=200)
plt.title('Number of Nobel Prizes Awarded per Year', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(ticks=np.arange(1900, 2021, step=5),
fontsize=14,
rotation=45)
ax = plt.gca() # get current axis
ax.set_xlim(1900, 2020)
ax.scatter(x=prize_per_year.index,
y=prize_per_year.values,
c='dodgerblue',
alpha=0.7,
s=100,)
ax.plot(prize_per_year.index,
moving_average.values,
c='crimson',
linewidth=3,)
plt.show()
yearly_avg_share = df_data.groupby(by='year').agg({'share_pct': pd.Series.mean})
share_moving_average = yearly_avg_share.rolling(window=5).mean()
plt.figure(figsize=(16,8), dpi=200)
plt.title('Number of Nobel Prizes Awarded per Year', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(ticks=np.arange(1900, 2021, step=5),
fontsize=14,
rotation=45)
ax1 = plt.gca()
ax2 = ax1.twinx()
ax1.set_xlim(1900, 2020)
# invert the second axis
ax2.invert_yaxis()
ax1.scatter(x=prize_per_year.index,
y=prize_per_year.values,
c='dodgerblue',
alpha=0.7,
s=100,)
ax1.plot(prize_per_year.index,
moving_average.values,
c='crimson',
linewidth=3,)
# Adding prize share plot on second axis
ax2.plot(prize_per_year.index,
share_moving_average.values,
c='grey',
linewidth=3,)
plt.show()
top_countries = df_data.groupby(['birth_country_current'],
as_index=False).agg({'prize': pd.Series.count})
top_countries.sort_values(by='prize', inplace=True)
top20_countries = top_countries[-20:]
h_bar = px.bar(x=top20_countries.prize,
y=top20_countries.birth_country_current,
orientation='h',
color=top20_countries.prize,
color_continuous_scale='Viridis',
title='Top 20 Countries by Number of Prizes')
h_bar.update_layout(xaxis_title='Number of Prizes',
yaxis_title='Country',
coloraxis_showscale=False)
h_bar.show()
df_countries = df_data.groupby(['birth_country_current', 'ISO'],
as_index=False).agg({'prize': pd.Series.count})
df_countries.sort_values('prize', ascending=False)
| birth_country_current | ISO | prize | |
|---|---|---|---|
| 74 | United States of America | USA | 281 |
| 73 | United Kingdom | GBR | 105 |
| 26 | Germany | DEU | 84 |
| 25 | France | FRA | 57 |
| 67 | Sweden | SWE | 29 |
| ... | ... | ... | ... |
| 32 | Iceland | ISL | 1 |
| 47 | Madagascar | MDG | 1 |
| 34 | Indonesia | IDN | 1 |
| 36 | Iraq | IRQ | 1 |
| 78 | Zimbabwe | ZWE | 1 |
79 rows × 3 columns
world_map = px.choropleth(df_countries,
locations='ISO',
color='prize',
hover_name='birth_country_current',
color_continuous_scale=px.colors.sequential.matter)
world_map.update_layout(coloraxis_showscale=True,)
world_map.show()
cat_country = df_data.groupby(['birth_country_current', 'category'],
as_index=False).agg({'prize': pd.Series.count})
cat_country.sort_values(by='prize', ascending=False, inplace=True)
merged_df = pd.merge(cat_country, top20_countries, on='birth_country_current')
# change column names
merged_df.columns = ['birth_country_current', 'category', 'cat_prize', 'total_prize']
merged_df.sort_values(by='total_prize', inplace=True)
cat_cntry_bar = px.bar(x=merged_df.cat_prize,
y=merged_df.birth_country_current,
color=merged_df.category,
orientation='h',
title='Top 20 Countries by Number of Prizes and Category')
cat_cntry_bar.update_layout(xaxis_title='Number of Prizes',
yaxis_title='Country')
cat_cntry_bar.show()
prize_by_year = df_data.groupby(by=['birth_country_current', 'year'], as_index=False).count()
prize_by_year = prize_by_year.sort_values('year')[['year', 'birth_country_current', 'prize']]
cumulative_prizes = prize_by_year.groupby(by=['birth_country_current',
'year']).sum().groupby(level=[0]).cumsum()
cumulative_prizes.reset_index(inplace=True)
l_chart = px.line(cumulative_prizes,
x='year',
y='prize',
color='birth_country_current',
hover_name='birth_country_current')
l_chart.update_layout(xaxis_title='Year',
yaxis_title='Number of Prizes')
l_chart.show()
top20_orgs = df_data.organization_name.value_counts()[:20]
top20_orgs.sort_values(ascending=True, inplace=True)
org_bar = px.bar(x = top20_orgs.values,
y = top20_orgs.index,
orientation='h',
color=top20_orgs.values,
color_continuous_scale=px.colors.sequential.haline,
title='Top 20 Research Institutions by Number of Prizes')
org_bar.update_layout(xaxis_title='Number of Prizes',
yaxis_title='Institution',
coloraxis_showscale=False)
org_bar.show()
top20_org_cities = df_data.organization_city.value_counts()[:20]
top20_org_cities.sort_values(ascending=True, inplace=True)
city_bar2 = px.bar(x = top20_org_cities.values,
y = top20_org_cities.index,
orientation='h',
color=top20_org_cities.values,
color_continuous_scale=px.colors.sequential.Plasma,
title='Which Cities Do the Most Research?')
city_bar2.update_layout(xaxis_title='Number of Prizes',
yaxis_title='City',
coloraxis_showscale=False)
city_bar2.show()
top20_cities = df_data.birth_city.value_counts()[:20]
top20_cities.sort_values(ascending=True, inplace=True)
city_bar = px.bar(x=top20_cities.values,
y=top20_cities.index,
orientation='h',
color=top20_cities.values,
color_continuous_scale=px.colors.sequential.Plasma,
title='Where were the Nobel Laureates Born?')
city_bar.update_layout(xaxis_title='Number of Prizes',
yaxis_title='City of Birth',
coloraxis_showscale=False)
city_bar.show()
country_city_org = df_data.groupby(by=['organization_country',
'organization_city',
'organization_name'], as_index=False).agg({'prize': pd.Series.count})
country_city_org = country_city_org.sort_values('prize', ascending=False)
country_city_org
| organization_country | organization_city | organization_name | prize | |
|---|---|---|---|---|
| 205 | United States of America | Cambridge, MA | Harvard University | 29 |
| 280 | United States of America | Stanford, CA | Stanford University | 23 |
| 206 | United States of America | Cambridge, MA | Massachusetts Institute of Technology (MIT) | 21 |
| 209 | United States of America | Chicago, IL | University of Chicago | 20 |
| 195 | United States of America | Berkeley, CA | University of California | 19 |
| ... | ... | ... | ... | ... |
| 110 | Japan | Sapporo | Hokkaido University | 1 |
| 111 | Japan | Tokyo | Asahi Kasei Corporation | 1 |
| 112 | Japan | Tokyo | Kitasato University | 1 |
| 113 | Japan | Tokyo | Tokyo Institute of Technology | 1 |
| 290 | United States of America | Yorktown Heights, NY | IBM Thomas J. Watson Research Center | 1 |
291 rows × 4 columns
burst = px.sunburst(country_city_org,
path=['organization_country', 'organization_city', 'organization_name'],
values='prize',
title='Where do Discoveries Take Place?',
)
burst.update_layout(xaxis_title='Number of Prizes',
yaxis_title='City',
coloraxis_showscale=False)
burst.show()
birth_years = df_data.birth_date.dt.year
df_data['winning_age'] = df_data.year - birth_years
display(df_data.nlargest(n=1, columns='winning_age'))
display(df_data.nsmallest(n=1, columns='winning_age'))
| year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | share_pct | winning_age | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 937 | 2019 | Chemistry | The Nobel Prize in Chemistry 2019 | “for the development of lithium-ion batteries” | 1/3 | Individual | John Goodenough | 1922-07-25 | Jena | Germany | Germany | Male | University of Texas | Austin TX | United States of America | DEU | 0.33 | 97.00 |
| year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | share_pct | winning_age | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 885 | 2014 | Peace | The Nobel Peace Prize 2014 | "for their struggle against the suppression of... | 1/2 | Individual | Malala Yousafzai | 1997-07-12 | Mingora | Pakistan | Pakistan | Female | NaN | NaN | NaN | PAK | 0.50 | 17.00 |
# Descriptive statistics
df_data.winning_age.describe()
count 934.00 mean 59.95 std 12.62 min 17.00 25% 51.00 50% 60.00 75% 69.00 max 97.00 Name: winning_age, dtype: float64
plt.figure(figsize=(8, 4), dpi=200)
sns.histplot(data=df_data,
x=df_data.winning_age,
bins=30)
plt.xlabel('Age')
plt.title('Distribution of Age on Receipt of Prize')
plt.show()
plt.figure(figsize=(8,4), dpi=200)
with sns.axes_style("whitegrid"):
sns.regplot(data=df_data,
x='year',
y='winning_age',
lowess=True,
scatter_kws = {'alpha': 0.4},
line_kws={'color': 'black'})
plt.show()
plt.figure(figsize=(8,4), dpi=200)
with sns.axes_style("whitegrid"):
sns.boxplot(data=df_data,
x='category',
y='winning_age')
plt.show()
with sns.axes_style('whitegrid'):
sns.lmplot(data=df_data,
x='year',
y='winning_age',
row = 'category',
lowess=True,
aspect=2,
scatter_kws = {'alpha': 0.6},
line_kws = {'color': 'black'},)
plt.show()
with sns.axes_style("whitegrid"):
sns.lmplot(data=df_data,
x='year',
y='winning_age',
hue='category',
lowess=True,
aspect=2,
scatter_kws={'alpha': 0.5},
line_kws={'linewidth': 5})
plt.show()